Offline Imitation Learning with Suboptimal Demonstrations via Relaxed Distribution Matching
Authors
Abstract
Offline imitation learning (IL) promises the ability to learn a performant policy from pre-collected demonstrations without interactions with the environment. However, imitating behaviors fully offline typically requires numerous expert data. To tackle this issue, we study the setting where we have limited expert data and supplementary suboptimal data. In this case, a well-known issue is the distribution shift between the learned policy and the behavior policy that collects the offline data. Prior works mitigate this issue by regularizing the KL divergence between the stationary state-action distributions of the learned policy and the behavior policy. We argue that such constraints based on exact distribution matching can be overly conservative and hamper policy learning, especially when the imperfect offline data is highly suboptimal. To resolve this issue, we present RelaxDICE, which employs an asymmetrically-relaxed f-divergence for explicit support regularization. Specifically, instead of driving the learned policy to exactly match the behavior policy, we impose little penalty whenever the density ratio between their stationary distributions is upper bounded by a constant. Note that such a formulation leads to a nested min-max optimization problem, which causes instability in practice. RelaxDICE addresses this challenge by supporting a closed-form solution for the inner maximization problem. Extensive empirical study shows that our method significantly outperforms the best prior offline IL method in six standard continuous control environments, with over 30% performance gain on average across 22 settings where the imperfect dataset is highly suboptimal.
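The key idea above is that the regularizer should be asymmetric: the learned policy's stationary distribution may deviate from the behavior distribution as long as the density ratio stays below a constant. A minimal sketch of such a one-sided support penalty (illustrative only; the constant `c` and the quadratic form are assumptions, not the paper's exact f-divergence):

```python
import numpy as np

def relaxed_penalty(density_ratio, c=2.0):
    """One-sided (asymmetrically relaxed) support penalty.

    density_ratio: estimate of d_pi(s, a) / d_B(s, a), the ratio between the
    learned policy's and the behavior policy's stationary distributions.
    Zero penalty while the ratio stays at or below the constant c; a
    quadratic penalty once it exceeds c (illustrative choice).
    """
    r = np.asarray(density_ratio, dtype=float)
    return np.where(r <= c, 0.0, (r - c) ** 2)
```

For example, states well inside the support of the offline data (ratio near 1) incur no penalty, while states far outside it are penalized, in contrast to an exact-matching KL term that penalizes any deviation in either direction.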
Similar papers
Imitation Learning with Demonstrations and Shaping Rewards
Imitation Learning (IL) is a popular approach for teaching behavior policies to agents by demonstrating the desired target policy. While the approach has led to many successes, IL often requires a large set of demonstrations to achieve robust learning, which can be expensive for the teacher. In this paper, we consider a novel approach to improve the learning efficiency of IL by providing a sha...
InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations
The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled. In this paper, we propose a new algorithm that can infer the latent structure of expert demonstrations in an unsupervised way. Our method, bui...
Imitation and Reinforcement Learning from Failed Demonstrations
Current work in robotic imitation learning uses successful demonstrations of a task performed by a human teacher to initialize a robot controller. Given a reward function, this learned controller can then be improved using techniques derived from reinforcement learning. We instead use failed attempts, which may be more plentiful, to initialize our controller and, taking them as illustrations of...
Left ventricle segmentation in MRI via convex relaxed distribution matching
A fundamental step in the diagnosis of cardiovascular diseases, automatic left ventricle (LV) segmentation in cardiac magnetic resonance images (MRIs) is still acknowledged to be a difficult problem. Most of the existing algorithms require either extensive training or intensive user inputs. This study investigates fast detection of the left ventricle (LV) endo- and epicardium surfaces in cardia...
Beyond rational imitation: learning arbitrary means actions from communicative demonstrations.
The principle of rationality has been invoked to explain that infants expect agents to perform the most efficient means action to attain a goal. It has also been demonstrated that infants take into account the efficiency of observed actions to achieve a goal outcome when deciding whether to reenact a specific behavior or not. It is puzzling, however, that they also tend to imitate an apparently...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i9.26305